ABSTRACT

The halo approach to large scale structure provides a physically motivated model for understanding
the clustering properties of galaxies. An important aspect of the halo model is a description of how
galaxies populate dark matter halos, now called the halo occupation distribution. We discuss
a way in which clustering information, especially in the non-linear regime, can be used to determine
moments of this halo occupation number. We invert the non-linear part of the real space power spectrum
from the PSCz survey to determine the second moment of the halo occupation distribution in a model
independent manner. Precise measurements of higher order correlations can eventually be used to
determine successively higher order moments of this distribution.




1. INTRODUCTION 

The halo approach to large scale structure has now become a useful tool to study and
understand the clustering properties of dark matter and a number of tracers, including
galaxies (see Cooray & Sheth 2002 for a recent review). This approach replaces the complex
distribution of dark matter with a collection of collapsed dark matter halos. Thus, necessary
inputs for a halo based model include properties of this dark matter halo population, such as
its mass function and the spatial profile of dark matter within each halo (Seljak 2000;
Ma & Fry 2000; Scoccimarro et al. 2000; Cooray et al. 2000).



In order to describe clustering aspects beyond dark matter, it is necessary to understand
how the tracer property is related to the dark matter distribution in each halo. In the case
of galaxies, an important input is a description of how galaxies populate halos. This is
usually achieved through the so-called halo occupation number, which describes the mean
number of galaxies in dark matter halos as a function of mass, together with its higher order
moments (Seljak 2000; Peacock & Smith 2000; Berlind & Weinberg 2002). For the two-point
correlation function, one requires information up to the second moment of the halo
occupation distribution, while higher order correlations depend on successively higher
moments.

The halo occupation distribution has been widely discussed in the literature in terms of
semi-analytical models of galaxy formation (e.g., Benson et al. 2001; Somerville et al.
2001). With the advent of a well defined halo approach to clustering, observational
constraints have also begun to appear (e.g., Scoccimarro et al. 2000; Moustakas &
Somerville 2002). While expectations for constraints on the halo occupation distribution
from current wide-field galaxy surveys, such as the Sloan Digital Sky Survey, are high
(Berlind & Weinberg 2002; Scranton 2002), these are, however, all considered in a model
dependent manner.

Though descriptions of the halo occupation number based on a specific model are useful, it
is probably more useful to consider model independent constraints. In this Letter, we
consider such an approach and suggest that clustering information, especially in the
non-linear regime, can be used to reconstruct various moments of the halo occupation
number. We discuss a possible inversion for this purpose and use results on the non-linear
power spectrum from the PSCz redshift survey (Saunders et al. 2000) by Hamilton & Tegmark
(2002; see also Hamilton et al. 2000) to provide a first estimate of the second moment of
the halo occupation number. We discuss both the strengths and the limitations of our approach
and provide a comparison to model based descriptions of the halo occupation number.

We provide a general discussion of our method in the next section. When illustrating
results in § 3, we take a flat $\Lambda$CDM cosmology with parameters $\Omega_c = 0.3$,
$\Omega_b = 0.05$, $\Omega_\Lambda = 0.65$, $h = 0.65$, $n = 1$, and
$\delta_H = 4.2 \times 10^{-5}$.

2. HALO APPROACH TO GALAXY CLUSTERING 

The halo approach to galaxy clustering assumes that large scale structure can be described
by a distribution of dark matter halos, while galaxies themselves form within these halos.
At the two-point level of correlations, the contribution can be written as a sum of the
correlations of galaxies that occupy two different halos and the correlation of galaxies
within a single halo. These two terms are generally identified in the literature as the
2-halo and 1-halo terms, respectively. Under the assumption that halos trace the linear
density field, the 2-halo term is proportional to the linear power spectrum with a bias
factor that can be calculated from analytical arguments (Mo et al. 1997). The mass
function, such as that of Press-Schechter (PS; Press & Schechter 1974), and the
distribution of dark matter in halos, such as the NFW profile of Navarro et al. (1996),
are known from either analytical or numerical techniques.

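Assuming a halo population with a mass function given by $n(m)$, and that galaxies trace
the dark matter in each halo randomly, we can write the total contribution to the power
spectrum of galaxies as

$$P_{\rm gal}(k) = P^{1h}_{\rm gal}(k) + P^{2h}_{\rm gal}(k),$$
$$P^{2h}_{\rm gal}(k) = P^{\rm lin}(k) \left[ \int dm\, n(m)\, b_1(m) \frac{\langle N_{\rm gal}|m\rangle}{\bar{n}_{\rm gal}} y(k|m) \right]^2,$$
$$P^{1h}_{\rm gal}(k) = \int dm\, n(m) \frac{\langle N_{\rm gal}(N_{\rm gal}-1)|m\rangle}{\bar{n}_{\rm gal}^2} |y(k|m)|^2. \qquad (1)$$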
Fig. 1. $|y(k|m)|^2$ as a function of mass for three different values
of $k$, as written on the plot, at a redshift of zero. We assume dark
matter profiles under the description of Navarro et al. (NFW; 1996).



Here, $y(k|m)$ represents the three-dimensional Fourier transform of the dark matter
profile. In general, the 1-halo term dominates the total contribution to the power
spectrum in the non-linear regime, while the 2-halo term captures the large scale
correlations in the linear regime.
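
As an illustration of how the kernel $y(k|m)$ can be evaluated, the following Python sketch integrates an NFW profile numerically. It is a minimal example under stated assumptions, not the calculation behind figure 1: the profile is truncated at a virial radius $r_{\rm vir} = c\, r_s$, and the function name, the scale radius `r_s`, the concentration `c`, and the midpoint-rule integration are all choices made here for illustration.

```python
import math

def nfw_fourier_transform(k, r_s, c, n_steps=4000):
    """Normalized Fourier transform y(k|m) of an NFW profile.

    Assumptions (not from the text): the halo is truncated at r_vir = c * r_s,
    and k and 1/r_s share units (e.g. h/Mpc and Mpc/h).
    """
    r_vir = c * r_s
    dr = r_vir / n_steps
    numerator = 0.0    # integral of 4 pi r^2 rho(r) sin(kr)/(kr) dr
    denominator = 0.0  # integral of 4 pi r^2 rho(r) dr, i.e. the enclosed mass
    for i in range(n_steps):
        r = (i + 0.5) * dr                      # midpoint rule avoids r = 0
        x = r / r_s
        rho = 1.0 / (x * (1.0 + x) ** 2)        # NFW shape; amplitude cancels in the ratio
        weight = 4.0 * math.pi * r * r * rho * dr
        kr = k * r
        sinc = math.sin(kr) / kr if kr > 1e-8 else 1.0
        numerator += weight * sinc
        denominator += weight
    return numerator / denominator

# Hypothetical halo with r_s = 0.2 Mpc/h and concentration 10.
y_large_scale = nfw_fourier_transform(1.0e-4, 0.2, 10.0)
y_k1 = nfw_fourier_transform(1.0, 0.2, 10.0)
y_k10 = nfw_fourier_transform(10.0, 0.2, 10.0)
```

Because the transform is normalized by the enclosed mass, it tends to 1 as $k \to 0$ and falls below 1 once $k$ probes scales inside the halo.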

In equation (1), $\langle N_{\rm gal}|m\rangle$ and $\langle N_{\rm gal}(N_{\rm gal}-1)|m\rangle$
are the first and second moments of the galaxy occupation distribution, respectively,
$\bar{n}_{\rm gal}$ is the mean number density of galaxies,

$$\bar{n}_{\rm gal} = \int dm\, n(m) \langle N_{\rm gal}|m\rangle, \qquad (2)$$

$b_1(m)$ is the first order halo bias (Mo et al. 1997), and $P^{\rm lin}(k)$ is the linear
power spectrum of the density field. On large scales, where the two-halo term dominates and
$y(k|m) \to 1$, the galaxy power spectrum simplifies to

$$P_{\rm gal}(k) \approx b_{\rm gal}^2 P^{\rm lin}(k), \qquad (3)$$

where

$$b_{\rm gal} = \int dm\, n(m)\, b_1(m) \frac{\langle N_{\rm gal}|m\rangle}{\bar{n}_{\rm gal}} \qquad (4)$$

represents the large scale constant bias factor of the galaxy population. Though we have
not considered them in detail, there are slight modifications to equation (1), especially
if one is interested in accounting for possibilities such as a single galaxy always forming
at the center of each dark matter halo and the fact that galaxies may not trace the dark
matter perfectly.

At non-linear scales, $y(k|m) \ll 1$, and the 1-halo term dominates. Since $y(k|m)$ is no
longer constant, if $y(k|m)$ is assumed a priori from a certain dark matter profile, we can
consider a possible inversion of the power spectrum measurements to reconstruct
$f(m) = n(m) \langle N_{\rm gal}(N_{\rm gal}-1)|m\rangle / \bar{n}_{\rm gal}^2$. With
information related to the halo mass function and the mean density of galaxies, which comes
directly from data, one can obtain a model independent estimate of
$\langle N_{\rm gal}(N_{\rm gal}-1)|m\rangle$. Note that in the large scale regime, because
of the scale-independent constant bias, the equation is non-invertible. Therefore,
clustering information cannot be used to reconstruct the mean of the halo occupation number
appropriately.

Fig. 2. The second moment of the halo occupation number, plotted as
$\langle N_{\rm gal}(N_{\rm gal}-1)|m\rangle^{1/2}$, as a function of halo mass $m$. The
two colored sets of error boxes show two independent sets of estimates, with the bins of one
set shifted relative to those of the other. For comparison, we show three estimates of the
second moment of the halo occupation: the blue (solid) and red (dotted) galaxy models of
Scranton (2002), and the best fit blue galaxy model of Scoccimarro et al. (2000) based on
APM data (dot-dashed line), with normalization following Sheth & Diaferio (2001).

Though our use of outside knowledge on $y(k|m)$ and $n(m)$ may make the extraction of the
second moment model dependent, one should note that these are effectively properties of the
dark matter and are well determined from N-body numerical simulations. The fact that we have
a potential method to estimate information on the galaxy side, mainly details of the halo
occupation number, without resorting to any models is extremely important. This is the main
result of this paper. We now consider the possibility of an inversion and an application of
our suggestion to galaxy clustering data at non-linear scales from the PSCz survey (Saunders
et al. 2000).

3. INVERSION 

In order to invert the non-linear power spectrum to estimate information related to the
second moment of the halo occupation number, we follow standard approaches in the
literature related to inversions associated with large scale structure clustering (e.g.,
Dodelson et al. 2001; Cooray 2002). These inversions are usually considered in order to
recover three-dimensional clustering information from two-dimensional angular clustering
data. The inversion problem here is similar: given estimates of $P_{\rm gal}(k)$, and
information related to $y(k|m)$, we would like to estimate the function $f(m)$ defined
earlier.

We can write the associated inversion equation as 



$$P = Y M, \qquad (5)$$



where P is a vector containing the data related to $P_{\rm gal}(k)$ with an associated
noise vector n, and Y is a matrix containing kernels at each $k_i$, where the non-linear
power spectrum is measured, and at each $m_j$, for which galaxy weighted mass function
information is desired. The inversion problem stated in terms of this equation involves
estimating M given the other vectors and the matrix Y. Note that the matrix Y differs from
the kernels defined by $|y(k|m)|^2$ due to an additional factor of $dm$. By appropriately
renormalizing equation (5) with noise, following Dodelson et al. (2001), we can consider the
minimum variance estimate of $f(m)$. We refer the reader to Dodelson et al. (2001) for full
details of the inversion. Additional discussions of related inversion techniques are
available in Dodelson & Gaztañaga (2000), Eisenstein & Zaldarriaga (2001) and
references therein.
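
To make the structure of this inversion concrete, the sketch below solves a small toy version of the linear problem by weighted least squares, which gives the minimum variance unbiased estimate when the noise is uncorrelated with known variances. It is not the implementation of Dodelson et al. (2001): the function names, the diagonal noise model, and the toy kernel values are assumptions made here for illustration, and each row of `Y` is taken to already include the $dm$ factor mentioned above.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("singular system: kernels are too degenerate")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - partial) / aug[i][i]
    return x

def min_variance_estimate(Y, P, sigma):
    """Estimate M in P = Y M + n, assuming uncorrelated noise with rms sigma_i.

    Returns the estimate and the normal matrix Y^T N^-1 Y, whose inverse is
    the covariance of the estimate.
    """
    n_k, n_m = len(Y), len(Y[0])
    w = [1.0 / s ** 2 for s in sigma]           # inverse noise variances
    normal = [[sum(w[i] * Y[i][r] * Y[i][c] for i in range(n_k))
               for c in range(n_m)] for r in range(n_m)]
    rhs = [sum(w[i] * Y[i][r] * P[i] for i in range(n_k)) for r in range(n_m)]
    return solve_linear_system(normal, rhs), normal

# Toy example: four k bins, two mass bins, noiseless data (illustrative numbers only).
Y_toy = [[1.0, 0.9], [0.8, 0.5], [0.5, 0.2], [0.2, 0.05]]
M_true = [2.0, 0.5]
P_toy = [sum(Y_toy[i][j] * M_true[j] for j in range(2)) for i in range(4)]
M_hat, normal_matrix = min_variance_estimate(Y_toy, P_toy, [0.1] * 4)
```

With correlated band powers, the diagonal weights would be replaced by the inverse of the full noise covariance matrix; the toy version keeps them diagonal only to stay short.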

In figure 1, we show $|y(k|m)|^2$ as a function of halo mass, $m$, for three different
values of $k$ at a redshift of zero. The behavior of these kernels is important for the
inversion, since it determines how stable the inversion is likely to be and to what extent
information can be extracted on $f(m)$. The effective width of $|y(k|m)|^2$ increases with
decreasing $k$, similar to the behavior one observes in kernels associated with the
inversion of a simple galaxy correlation function. The kernels in the latter case involve
$J_0$, the zeroth order Bessel function, which when plotted looks similar in shape to the
associated kernels here, except for the ringing part associated with $J_0$ functions that
oscillate between positive and negative values.

The general behavior of the present kernels, as a function of $k$ and $m$, is also
consistent with how the halo model contributes to the non-linear power spectrum: at small
physical scales, contributions come from the low mass halos, while at large physical scales,
or small $k$ values, contributions to the power spectrum come from the whole mass range. The
overlap of kernels with decreasing $k$ suggests that any estimates of the non-linear power
spectrum are likely to be highly correlated. This is, in fact, true when one considers the
covariance resulting from the four-point correlation function or the trispectrum (e.g.,
Cooray & Hu 2001). Thus, any measurement of the non-linear power spectrum should be
considered with its associated covariance matrix, and correlations should be properly
accounted for when inverting the power spectrum to determine any physical property, such as
$f(m)$ defined above.

In terms of published analyses of clustering, this information, unfortunately, is not fully
available to us from the literature. The best published estimate of the non-linear galaxy
power spectrum, so far, comes from the PSCz survey by Hamilton & Tegmark (2002). Given the
lack of knowledge of its covariance, and as advised by one of the authors, we used the
"prewhitened" power spectrum estimated by the same authors to carry out an inversion as part
of this study. Hamilton & Tegmark (2002) suggest that estimates of the prewhitened power
spectrum are decorrelated, and we used these estimates and their errors in this inversion
without accounting for correlations.

For the purpose of this inversion, we define the non-linear part as the power spectrum at
$k > 1\,h\,{\rm Mpc}^{-1}$. At small $k$ values, we expect a fractional contribution from
the 2-halo term, or the correlated part between galaxies in different halos. This
contribution, however, decreases rapidly with increasing $k$. We use power spectrum
estimates out to $k \sim 300\,h\,{\rm Mpc}^{-1}$, though estimates beyond
$\sim 100\,h\,{\rm Mpc}^{-1}$ from the PSCz survey are noise dominated. Following the
redshift distribution of PSCz as measured by Saunders et al. (2000), we assume a redshift
of 0.028 for the full non-linear power spectrum. It is likely that estimates of the
non-linear power are redshift dependent (Scranton & Dodelson 2000); however, we have no
useful information to account for such variations.

When converting $f(m)$ estimates to the second moment of the halo occupation number, we
require information on the number density of PSCz galaxies at the mean redshift of the
estimated non-linear power spectrum. We obtained this information using the luminosity
function of PSCz galaxies as calculated by Seaborne et al. (1999; see also Saunders et al.
1990), but we have not attempted to account for any variations in the density between the
original PSCz catalog and the one utilized in estimating the power spectrum by Hamilton &
Tegmark (2002). We expect any changes in this respect, if any at all, to be minor. Also, in
order to convert $f(m)$ to the second moment, we assume the halo mass function is described
by Press & Schechter (1974) theory. Again, we make no attempt to incorporate any
uncertainties in the mass function.

We summarize our results on the second moment of the halo occupation distribution in
figure 2. Here, we plot $\langle N_{\rm gal}(N_{\rm gal}-1)|m\rangle^{1/2}$ as a function
of the halo mass $m$. We present two separate sets of estimates of this quantity, with the
bands on the mass axis of the second set shifted relative to the bands of the first set.
This shifting demonstrates the robustness of the inversion and follows the now well known
analysis technique introduced by cosmic microwave background experimentalists. Note that
our estimates are likely to be affected by the assumptions used in the analysis. One of the
main drawbacks is the lack of an accounting of the full covariance, or associated
correlations, between power spectrum measurements. Another is the assumption that galaxies
are randomly distributed in each halo with no preference to be at the halo center. The
latter assumption is likely to affect the estimates at the low mass end, especially when
$\langle N_{\rm gal}(N_{\rm gal}-1)|m\rangle^{1/2}$ goes below 1. Since we find this to be
the case only in our first bins, we have not attempted to correct for this any further.

The second moment of the halo occupation number estimated here should be considered as the
value at the mean redshift of the PSCz catalog. The estimates clearly show the power-law
behavior of the halo occupation number. If the distribution can be described by a Poisson
distribution, then we can write
$\langle N_{\rm gal}(N_{\rm gal}-1)|m\rangle^{1/2} = \langle N_{\rm gal}|m\rangle$. For
comparison, we also plot predictions for the mean number, $\langle N_{\rm gal}|m\rangle$,
following various analyses in the literature. These include model descriptions by Scranton
(2002), using results from semi-analytical models of galaxy formation, and by Scoccimarro
et al. (2000), from power law fits to the APM clustering data. The latter can be described
as $\langle N_{\rm gal}|m\rangle = A (m/m_0)^{0.8}$. We normalize this curve using
$m_0 = 4 \times 10^{12}\,h^{-1}\,M_\odot$ and $A = 0.7$, following Sheth & Diaferio (2001).
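
The Poisson relation quoted above can be checked numerically. The following sketch, a minimal illustration rather than part of the analysis, evaluates the power-law mean occupation with the normalization given in the text and confirms by direct summation that a Poisson distribution with that mean has $\langle N(N-1)\rangle^{1/2} = \langle N\rangle$. The function names, the example halo mass of $10^{14}\,h^{-1}\,M_\odot$, and the truncation of the sum at 200 galaxies are assumptions made here.

```python
import math

def mean_occupation(m, amplitude=0.7, m0=4.0e12, slope=0.8):
    """Power-law mean occupation <N_gal|m> = A (m/m0)^slope; masses in h^-1 M_sun."""
    return amplitude * (m / m0) ** slope

def poisson_pmf(lam, n_max):
    """Poisson probabilities P(N = 0..n_max), built iteratively to avoid large factorials."""
    probs = [math.exp(-lam)]
    for n in range(1, n_max + 1):
        probs.append(probs[-1] * lam / n)
    return probs

def second_factorial_moment(probs):
    """<N (N - 1)> for a discrete distribution given as a list of probabilities."""
    return sum(n * (n - 1) * p for n, p in enumerate(probs))

# Example halo mass (hypothetical choice); the sum is truncated at N = 200.
lam = mean_occupation(1.0e14)
root_second_moment = math.sqrt(second_factorial_moment(poisson_pmf(lam, 200)))
```

Any distribution narrower than Poisson at fixed mean gives a smaller second factorial moment, so its square root falls below the mean; this is the sense in which an occupation distribution is called sub-Poissonian.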

At intermediate mass ranges and below, estimated band powers generally fall above these
descriptions, suggesting that $\langle N_{\rm gal}(N_{\rm gal}-1)|m\rangle^{1/2}$ may
follow $\alpha(m) \langle N_{\rm gal}|m\rangle$. There are descriptive models available in
the literature for $\alpha(m)$ (Scoccimarro et al. 2000; Scranton 2002). These, however,
indicate $\alpha(m) < 1$, such that the halo occupation number is sub-Poissonian at the low
mass end. This is contrary to the estimates we have obtained. In general, Scranton's blue
galaxy model is more consistent with our estimates, though, a priori, there is no reason to
believe that PSCz galaxies should be described by such a model.

Assuming the second moment can be described by a simple power law such that
$\langle N_{\rm gal}(N_{\rm gal}-1)|m\rangle^{1/2} = (m/m_0)^p$, we fitted our band power
estimates to constrain $(m_0, p)$. When estimating the likelihood, we use the full
covariance matrix of our estimates of $\langle N_{\rm gal}(N_{\rm gal}-1)|m\rangle$. We show
1-, 2- and 3-sigma contours in figure 3. In general, our estimates are consistent with a
wide range of values of $m_0$, from $10^{10}$ to $10^{13}$, and $p$, from 0.2 to 1, at the
3-sigma level, while power laws with slopes less than 0.8 are consistent at the 2-sigma
level.

Fig. 3. Constraints on the slope, $p$, and normalization, $m_0$, of the halo occupation
number from the PSCz data. We show 1-, 2- and 3-sigma contour levels, which are labeled as
$\Delta\chi^2 = 2.3$, 6.2 and 11.8, respectively.
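
A fit of this kind can be sketched as a grid search over the two parameters. The example below is an illustration under simplifying assumptions and does not reproduce the PSCz constraints: it uses synthetic, noiseless band powers generated from the power law itself, treats the errors as uncorrelated (whereas the fit described above uses the full covariance matrix), and the function names, grid ranges and 20% errors are hypothetical choices. The $\Delta\chi^2$ thresholds are the two-parameter levels quoted for figure 3.

```python
def model(m, log10_m0, p):
    """Power-law second moment: <N(N-1)|m>^(1/2) = (m / m0)^p."""
    return (m / 10.0 ** log10_m0) ** p

def chi_square(log10_m0, p, masses, values, errors):
    """Chi-square assuming uncorrelated errors (a simplification made here)."""
    return sum(((v - model(m, log10_m0, p)) / e) ** 2
               for m, v, e in zip(masses, values, errors))

def grid_fit(masses, values, errors, log10_m0_grid, p_grid):
    """Return the best-fit grid point and Delta chi^2 at every grid point."""
    chi2 = {(a, p): chi_square(a, p, masses, values, errors)
            for a in log10_m0_grid for p in p_grid}
    best = min(chi2, key=chi2.get)
    delta = {key: value - chi2[best] for key, value in chi2.items()}
    return best, delta

# Synthetic band powers drawn from the model itself (not PSCz data).
masses = [1.0e11, 1.0e12, 1.0e13, 1.0e14, 1.0e15]
values = [model(m, 12.0, 0.6) for m in masses]
errors = [0.2 * v for v in values]                     # assumed 20% errors

log10_m0_grid = [i / 10.0 for i in range(100, 131)]    # 10.0 to 13.0
p_grid = [i / 20.0 for i in range(4, 21)]              # 0.2 to 1.0

best, delta = grid_fit(masses, values, errors, log10_m0_grid, p_grid)

# Count grid points inside each two-parameter confidence region.
levels = {"1-sigma": 2.3, "2-sigma": 6.2, "3-sigma": 11.8}
inside = {name: sum(1 for d in delta.values() if d <= level)
          for name, level in levels.items()}
```

Tracing the boundary of the grid points counted in `inside` gives contours analogous to those in figure 3; for correlated band powers, the sum in `chi_square` would become a quadratic form with the inverse covariance matrix.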

The present technique can be extended to higher order correlations as well. Note that in
the non-linear regime, under the halo model, the p-point power spectrum, especially in the
case of equal length configurations with $k_i = k$, can be written as

$$P_p(k) = \int dm\, n(m) \frac{\langle N_{\rm gal}(N_{\rm gal}-1) \cdots (N_{\rm gal}-n)|m\rangle}{\bar{n}_{\rm gal}^p} |y(k|m)|^p, \qquad (6)$$

where $n = p - 1$. The inversion technique is effectively similar and can be used to
determine the related higher order moments of the galaxy halo occupation number. Once again,
such an approach will require higher order p-point power estimates down to the highly
non-linear regime and, more importantly, a proper accounting of the associated errors. As
always, we look forward to the day such a study can be carried out with observational
measurements from ongoing and upcoming wide-field surveys such as the Sloan and many others.



4. SUMMARY

The halo approach to large scale structure provides a physically motivated technique to
study the clustering properties of galaxies (Cooray & Sheth 2002 and references therein). A
necessary and important ingredient of a halo based clustering calculation is a description
of how galaxies populate dark matter halos, the so-called halo occupation number. We have
raised the possibility of a model independent study of moments of the halo occupation
number using an inversion of measurements of the non-linear clustering power spectrum and
p-point spectra. We have considered an application of this suggestion utilizing the PSCz
power spectrum estimated by Hamilton & Tegmark (2002). Our estimates of the second moment of
the halo occupation number are consistent with power law models over five decades in mass
and with certain model descriptions in the literature. With the expected increase in
measurements of clustering in the non-linear regime, and associated measurements of the
covariance, we expect that analyses like the one suggested here will eventually make
possible a detailed understanding of the nature of galaxy occupation in halos.



